Cross-calibration of Time-of-flight and Colour Cameras
Time-of-flight cameras provide depth information, which is complementary to
the photometric appearance of the scene in ordinary images. It is desirable to
merge the depth and colour information, in order to obtain a coherent scene
representation. However, the individual cameras will have different viewpoints,
resolutions and fields of view, which means that they must be mutually
calibrated. This paper presents a geometric framework for this multi-view and
multi-modal calibration problem. It is shown that three-dimensional projective
transformations can be used to align depth and parallax-based representations
of the scene, with or without Euclidean reconstruction. A new evaluation
procedure is also developed; this allows the reprojection error to be
decomposed into calibration and sensor-dependent components. The complete
approach is demonstrated on a network of three time-of-flight and six colour
cameras. The applications of such a system, to a range of automatic
scene-interpretation problems, are discussed. Comment: 18 pages, 12 figures, 3 tables
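As a rough illustration of the alignment step described in this abstract, the sketch below applies a three-dimensional projective transformation (a 4x4 matrix acting on homogeneous coordinates) to a single 3D point. The function name `apply_projective` and the matrix `H_example` are hypothetical and chosen only for illustration; they are not taken from the paper, which estimates such transformations from calibration data.

```python
def apply_projective(H, point):
    """Apply a 4x4 projective transformation H to a 3D point (x, y, z)."""
    x, y, z = point
    hom = (x, y, z, 1.0)  # lift the point to homogeneous coordinates
    out = [sum(H[r][c] * hom[c] for c in range(4)) for r in range(4)]
    w = out[3]
    if abs(w) < 1e-12:
        raise ValueError("point is mapped to infinity")
    # divide by the last coordinate to return to ordinary 3D coordinates
    return tuple(v / w for v in out[:3])


# Hypothetical example matrix (an assumption, not from the paper).
# The non-trivial last row makes the mapping projective rather than affine.
H_example = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

aligned = apply_projective(H_example, (2.0, 4.0, 1.0))
```

With this matrix the homogeneous result is (2, 4, 1, 2), so the point maps to (1.0, 2.0, 0.5) after division by the last coordinate.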
Automatic Detection of Calibration Grids in Time-of-Flight Images
It is convenient to calibrate time-of-flight cameras by established methods,
using images of a chequerboard pattern. The low resolution of the amplitude
image, however, makes it difficult to detect the board reliably. Heuristic
detection methods, based on connected image-components, perform very poorly on
this data. An alternative, geometrically-principled method is introduced here,
based on the Hough transform. The projection of a chequerboard is represented
by two pencils of lines, which are identified as oriented clusters in the
gradient-data of the image. A projective Hough transform is applied to each of
the two clusters, in axis-aligned coordinates. The range of each transform is
properly bounded, because the corresponding gradient vectors are approximately
parallel. Each of the two transforms contains a series of collinear peaks; one
for every line in the given pencil. This pattern is easily detected, by
sweeping a dual line through the transform. The proposed Hough-based method is
compared to the standard OpenCV detection routine, by application to several
hundred time-of-flight images. It is shown that the new method detects
significantly more calibration boards, over a greater variety of poses, without
any overall loss of accuracy. This conclusion is based on an analysis of both
geometric and photometric error. Comment: 11 pages, 11 figures, 1 table
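For readers unfamiliar with Hough voting, the sketch below shows the standard (rho, theta) Hough transform for lines on a small set of points. This is a swapped-in, simpler technique: it is not the projective, axis-aligned Hough transform of the paper, and it does not detect pencils of lines. The function name `hough_votes` and all parameter values are assumptions made for illustration.

```python
import math
from collections import Counter


def hough_votes(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in a (theta index, rho index) table for 2D points."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # line-normal angle in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes


# Ten points on the horizontal line y = 5 (synthetic data, for illustration)
points = [(x, 5) for x in range(0, 100, 10)]
votes = hough_votes(points)

# The strongest peak corresponds to the line through all ten points
(t_best, r_best), count = votes.most_common(1)[0]
```

Here the peak falls at the angle index 90 (a normal of 90 degrees) and rho index 5, with all ten points voting for it. A chequerboard would instead produce a series of such peaks, one per line.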
View-based approaches to spatial representation in human vision
In an immersive virtual environment, observers fail to notice the expansion of a room around them and consequently make gross errors when comparing the size of objects. This result is difficult to explain if the visual system continuously generates a 3-D model of the scene based on known baseline information from interocular separation or proprioception as the observer walks. An alternative is that observers use view-based methods to guide their actions and to represent the spatial layout of the scene. In this case, they may have an expectation of the images they will receive but be insensitive to the rate at which images arrive as they walk. We describe the way in which the eye movement strategy of animals simplifies motion processing if their goal is to move towards a desired image and discuss dorsal and ventral stream processing of moving images in that context. Although many questions about view-based approaches to scene representation remain unanswered, the solutions are likely to be highly relevant to understanding biological 3-D vision.
Filter Transformations for Shift-Insensitive Feature Detection
The representation of oriented image-structure is an important part of most biological vision models. It is possible, for example, to estimate both motion and binocular disparity from the responses of oriented filters (Adelson & Bergen 1985, JOSA A 2(2), 284-299). It is particularly useful to combine the responses of different filters, in order to obtain a response to edge-like structures that is insensitive to slight shifts (in the direction perpendicular to the edge). It has been hypothesized that complex cells achieve this by separating the local energy of the signal from its phase. We describe an alternative approach, which is based on the 'local jet' representation (Koenderink & van Doorn 1987, Biol. Cyb. 55, 367-375). Each jet is computed from a set of oriented derivative filters, of order 1 to N, which are applied at a given image location. We show that these filters can be used as a basis for a new set, which contains filters of a single order, each at a slightly different location. The maximum response, over the new set, is insensitive to small image-shifts. This approach can be justified by noting that a Taylor approximation of the shifted Kth order filter can be obtained from the N-K higher-order filters in the jet. It is shown, however, that a least-squares construction is more practical. Finally, it is noted that the responses of the new filters can be obtained from a linear transformation of the original N image derivatives.
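The Taylor argument in this abstract can be sketched numerically in one dimension. The example below assumes unit-variance Gaussian derivative filters (an assumption; the abstract does not specify the filter family) and approximates a shifted filter of order K from the filters of order K to N at the original location. The names `gaussian_derivative` and `taylor_shifted_filter` are hypothetical, and the least-squares construction preferred in the paper is not shown.

```python
import math


def gaussian_derivative(order, x):
    """Value at x of the derivative (order 0 to 3) of exp(-x^2 / 2)."""
    g = math.exp(-0.5 * x * x)
    polys = {
        0: 1.0,
        1: -x,
        2: x * x - 1.0,
        3: -x ** 3 + 3.0 * x,
    }
    return polys[order] * g


def taylor_shifted_filter(K, N, x, shift):
    """Approximate the order-K filter at x + shift from orders K..N at x."""
    return sum(
        shift ** j / math.factorial(j) * gaussian_derivative(K + j, x)
        for j in range(N - K + 1)
    )


# First-order filter shifted by 0.1, built from orders 1 to 3 (N - K = 2)
approx = taylor_shifted_filter(K=1, N=3, x=0.5, shift=0.1)
exact = gaussian_derivative(1, 0.6)
unshifted = gaussian_derivative(1, 0.5)
```

For this small shift the Taylor estimate lies much closer to the exact shifted value than the unshifted filter does; the approximation degrades as the shift grows, which is consistent with the abstract's remark that a least-squares construction is more practical.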
Detection and Localization of 3D Audio-Visual Objects Using Unsupervised Clustering
This paper addresses the issues of detecting and localizing objects in a scene that are both seen and heard. We explain the benefits of a human-like configuration of sensors (binaural and binocular) for gathering auditory and visual observations. It is shown that the detection and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data into a common audio-visual 3D representation via a pair of mixture models. Inference is performed by a version of the expectation-maximization algorithm, which is formally derived, and which provides cooperative estimates of both the auditory activity and the 3D position of each object. We describe several experiments with single- and multiple-speaker detection and localization, in the presence of other audio sources.
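To make the clustering idea concrete, the sketch below runs a generic expectation-maximization loop for a one-dimensional Gaussian mixture. It is a simplified stand-in, not the paper's model: the paper couples a pair of mixture models over audio and visual observations in 3D, whereas this example clusters scalar values only. The function name `em_gmm_1d`, the synthetic data and the initial means are all assumptions.

```python
import math


def em_gmm_1d(data, initial_means, n_iter=50):
    """Fit a 1D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        resp = []
        for x in data:
            p = [
                weights[j]
                * math.exp(-((x - means[j]) ** 2) / (2.0 * variances[j]))
                / math.sqrt(2.0 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate the parameters from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances


# Synthetic observations from two well-separated sources (illustrative only)
data = [-0.2, 0.0, 0.2, 9.8, 10.0, 10.2]
weights, means, variances = em_gmm_1d(data, initial_means=[1.0, 9.0])
```

On this data the two component means settle near 0 and 10, with roughly equal weights. Each component plays the role of one detected object, and the responsibilities computed in the E-step assign observations to objects.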